class: center, middle, inverse, title-slide

# Alluvial Diagrams
### Ludmila Janda
### 2021

---

<style type="text/css">
.remark-slide-content {
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
</style>

# What are Alluvial Diagrams?

<img src="data:image/png;base64,#images/alluvial-fans.jpeg" width="600" />

---

# What are Alluvial Diagrams?

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-3-1.png" width="864" />

---

# When to Use Alluvial Diagrams

--

- When showing groups moving from one state of being to another

--

- When you have a reasonable number of groups and states of being

---

# Terminology

- **Axes** the different states that the graph shows movement between

<img src="data:image/png;base64,#images/axes.png" width="600" />

---

# Terminology

- **Stratum** the groupings at each state/axis

<img src="data:image/png;base64,#images/stratum.png" width="600" />

---

# Terminology

- **Flow** the movement from one state/axis to another
- **Alluvium** the movement across all states/axes (all flows together)
- **Lode** intersection of one alluvium and one stratum

<img src="data:image/png;base64,#images/alluvium-flow-lode.png" width="500" />

---

# Terminology

- **Axes** the different states that the graph shows movement between
- **Stratum** the groupings at each state/axis
- **Flow** the movement from one state/axis to another
- **Alluvium** the movement across all states/axes (all flows together)
- **Lode** intersection of one alluvium and one stratum

<img src="data:image/png;base64,#images/alluvium-flow-lode.png" width="400" />

---

# Raw Data

```r
sim_data_pre_post <- read_csv(here::here("sim_data_pre_post.csv")) %>%
  select(student, unit_title, assessment, score_level)

head(sim_data_pre_post, 5)
```

```
## # A tibble: 5 x 4
##   student unit_title                      assessment score_level
##     <dbl> <chr>                           <chr>            <dbl>
## 1       1 Unicorn Traits and Reproduction pre                  1
## 2       2 Unicorn Traits and Reproduction pre                  1
## 3       3 Unicorn Traits and Reproduction pre                  2
## 4       4 Unicorn Traits and Reproduction pre                  1
## 5       5 Unicorn Traits and Reproduction pre                  1
```

---

# Data Cleaning

- Reshape data to wide by axes

```r
sim_data_pre_post <- read_csv(here::here("sim_data_pre_post.csv")) %>%
  dplyr::select(student, unit_title, assessment, score_level) %>%
  dplyr::mutate(score_level = factor(score_level),
                assessment = factor(assessment, levels = c("pre", "post"))) %>%
  tidyr::pivot_wider(names_from = assessment, values_from = score_level)

head(sim_data_pre_post, 5)
```

```
## # A tibble: 5 x 4
##   student unit_title                      pre   post
##     <dbl> <chr>                           <fct> <fct>
## 1       1 Unicorn Traits and Reproduction 1     4
## 2       2 Unicorn Traits and Reproduction 1     3
## 3       3 Unicorn Traits and Reproduction 2     3
## 4       4 Unicorn Traits and Reproduction 1     4
## 5       5 Unicorn Traits and Reproduction 1     4
```

---

# Using `to_lodes_form()`

```r
sim_data_pre_post <- sim_data_pre_post %>%
  ggalluvial::to_lodes_form(key = "assessment", axes = 3:4)

head(sim_data_pre_post, 5)
```

```
## # A tibble: 5 x 5
##   student unit_title                      alluvium assessment stratum
##     <dbl> <chr>                              <int> <fct>      <fct>
## 1       1 Unicorn Traits and Reproduction        1 pre        1
## 2       2 Unicorn Traits and Reproduction        2 pre        1
## 3       3 Unicorn Traits and Reproduction        3 pre        2
## 4       4 Unicorn Traits and Reproduction        4 pre        1
## 5       5 Unicorn Traits and Reproduction        5 pre        1
```

---

# Plot Elements

`ggalluvial::geom_flow(alpha = 0.5)` - gives you flows from one axis to the next (useful to make somewhat transparent with alpha)

`geom_alluvium(aes(fill = final_outcome))` - gives you the alluvia across all the axes (useful to pick how to set the fill)

`ggalluvial::geom_stratum()` - gives you the strata

---

# Plot Code - Add Strata

```r
sim_data_pre_post %>%
  ggplot(aes(x = assessment,         # categorical axis var (pre or post)
             stratum = stratum,      # categorical group var (score level)
             alluvium = alluvium)) + # individual/unit (student)
  ggalluvial::geom_stratum()
```

---

# Plot Code - Add Strata

<img
src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-12-1.png" width="504" />

---

# Plot Code - Add Flows

```r
sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = stratum,
             alluvium = alluvium)) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5)
```

---

# Plot Code - Add Flows

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-14-1.png" width="504" />

---

# Plot Code - Add fill

```r
sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = stratum,
             alluvium = alluvium,
             fill = stratum)) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5)
```

---

# Plot Code - Add fill

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-16-1.png" width="864" />

---

# Plot Code - Reverse Stratum

```r
sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = fct_rev(stratum),
             alluvium = alluvium,
             fill = fct_rev(stratum))) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5)
```

---

# Plot Code - Reverse Stratum

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-18-1.png" width="864" />

---

# Plot - Facet Wrap

```r
library(ggalluvial)

sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = fct_rev(stratum),
             alluvium = alluvium,
             fill = fct_rev(stratum),
             label = fct_rev(stratum))) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5) +
  facet_wrap(~unit_title, scales = "free_y")
```

---

# Plot - Facet Wrap

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-20-1.png" width="864" />

---

# Plot - Add colors

```r
library(ggalluvial)

sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = fct_rev(stratum),
             alluvium = alluvium,
             fill = fct_rev(stratum),
             label = fct_rev(stratum))) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5) +
  facet_wrap(~unit_title, scales = "free_y") +
  scale_fill_manual("Score Level",
                    values = c("1" = "#BFBFBF",
                               "2" = "#55A5CC",
                               "3" = "#30779C",
                               "4" = "#004C6D"))
```

---

# Plot - Add colors

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-22-1.png" width="864" />

---

# Plot - Beautify

```r
library(ggalluvial)

sim_data_pre_post %>%
  ggplot(aes(x = assessment,
             stratum = fct_rev(stratum),
             alluvium = alluvium,
             fill = fct_rev(stratum),
             label = fct_rev(stratum))) +
  ggalluvial::geom_stratum() +
  ggalluvial::geom_flow(alpha = 0.5) +
  facet_wrap(~unit_title, scales = "free_y") +
  scale_fill_manual("Score Level",
                    values = c("1" = "#BFBFBF",
                               "2" = "#55A5CC",
                               "3" = "#30779C",
                               "4" = "#004C6D")) +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "") +
  theme_minimal() +
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "grey"),
        axis.text.x = element_text(size = 18),
        axis.text.y = element_text(size = 14),
        strip.text.x = element_text(size = 16),
        legend.position = "bottom",
        legend.title = element_text(size = 16),
        legend.text = element_text(size = 16))
```

---

# Plot - Beautify

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-24-1.png" width="864" />

---

# Plot with `geom_alluvium()`

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/ggalluvial2-1.png" width="1008" />

.tiny[
Source: https://mdneuzerling.com/post/my-data-science-job-hunt/
]

---

# Raw Data

```r
job_outcomes <- read_csv(here::here("job_outcomes.csv"))

head(job_outcomes, 10)
```

```
## # A tibble: 10 x 5
##    job_id contact   `1st stage` `2nd stage` outcome
##     <dbl> <chr>     <chr>       <chr>       <chr>
##  1      1 job site  ghosted     <NA>        <NA>
##  2      2 job site  ghosted     <NA>        <NA>
##  3      3 recruiter coffee      no role     <NA>
##  4      4 LinkedIn  ghosted     <NA>        <NA>
##  5      5 network   coffee      ghosted     <NA>
##  6      6 network   phone call  interview   withdrew
##  7      7 LinkedIn  phone call  withdrew    <NA>
##  8      8 LinkedIn  ghosted     <NA>        <NA>
##  9      9 internal  coffee      interview   rejected
## 10     10 LinkedIn  phone call  interview   rejected
```

---

# Data Cleaning

```r
job_outcomes <- read_csv(here::here("job_outcomes.csv"))

job_outcomes %>%
  mutate(final_outcome = coalesce(outcome, `2nd stage`, `1st stage`)) %>%
  head(10)
```

```
## # A tibble: 10 x 6
##    job_id contact   `1st stage` `2nd stage` outcome  final_outcome
##     <dbl> <chr>     <chr>       <chr>       <chr>    <chr>
##  1      1 job site  ghosted     <NA>        <NA>     ghosted
##  2      2 job site  ghosted     <NA>        <NA>     ghosted
##  3      3 recruiter coffee      no role     <NA>     no role
##  4      4 LinkedIn  ghosted     <NA>        <NA>     ghosted
##  5      5 network   coffee      ghosted     <NA>     ghosted
##  6      6 network   phone call  interview   withdrew withdrew
##  7      7 LinkedIn  phone call  withdrew    <NA>     withdrew
##  8      8 LinkedIn  ghosted     <NA>        <NA>     ghosted
##  9      9 internal  coffee      interview   rejected rejected
## 10     10 LinkedIn  phone call  interview   rejected rejected
```

---

# Plot with `geom_alluvium()`

```r
job_outcomes <- read_csv(here::here("job_outcomes.csv"))

job_outcomes %>%
  mutate(final_outcome = coalesce(outcome, `2nd stage`, `1st stage`)) %>%
  ggalluvial::to_lodes_form(key = "contact", axes = 2:5) %>%
  ggplot(aes(x = contact,
             stratum = stratum,
             alluvium = alluvium,
             label = stratum)) +
  ggalluvial::geom_alluvium(aes(fill = final_outcome),
                            color = "darkgrey",
                            na.rm = TRUE) +
  ggalluvial::geom_stratum(na.rm = TRUE) +
  geom_text(stat = "stratum", na.rm = TRUE, size = 5) +
  theme_minimal() +
  theme(text = element_text(size = 20),
        legend.position = "bottom") +
  labs(x = "",
       fill = "Final Outcome",
       caption = "David Neuzerling @mdneuzerling") +
  scale_fill_manual(values = c("ghosted" = "#F0E442",
                               "no role" = "#CC79A7",
                               "withdrew" = "#0072B2",
                               "rejected" = "#D55E00",
                               "offer" = "#009E73"))
```

---

# Plot with `geom_alluvium()`

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-28-1.png" width="1008" />

---

# Contrast with `geom_flow()`

<img src="data:image/png;base64,#alluvial_slides_files/figure-html/unnamed-chunk-29-1.png" width="1008" />

---

# Examples

<img src="data:image/png;base64,#images/ex-mayoral-race.png" width="700" />

.tiny[
Source:
https://www.nytimes.com/interactive/2021/06/22/us/elections/results-nyc-mayor-primary.html
]

---

# Examples

<img src="data:image/png;base64,#images/ex-economist-immigration.png" width="500" />

.tiny[
Source: http://www.economist.com/blogs/graphicdetail/2015/05/daily-chart-1?fsrc=scn/tw/te/bl/ed/seeking_safety&%3Ffsrc%3Dscn/=tw/dc
]

---

# Examples

- Better as a mosaic?

<img src="data:image/png;base64,#images/ex-titanic.png" width="600" />

.tiny[
Source: https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html
]

---

# Examples

- Including gender in this graph gives us very little information

<img src="data:image/png;base64,#images/ex-cancers.png" width="600" />

.tiny[
Source: https://digitalsplashmedia.com/2014/06/visualizing-categorical-data-as-flows-with-alluvial-diagrams/
]

---

# Examples

- Way too much going on here!

<img src="data:image/png;base64,#images/ex-energy.png" width="400" />

.tiny[
Source: https://digitalsplashmedia.com/2014/06/visualizing-categorical-data-as-flows-with-alluvial-diagrams/
]

---

# In Summary

- Ensure that your data fits the alluvial specifications

--

- Reshape the data to wide by axes if needed for `to_lodes_form()`

--

- Consider whether you want to highlight flows or alluvia

--

- Pay attention to your use of color (both in terms of variable type and whether you are coloring from the starting state or the ending state)

--

- HAVE FUN!

---

# Resources

ggalluvial CRAN vignette: https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html

Defining and taxonomizing alluvial diagrams: https://corybrunson.github.io/2019/09/13/flow-taxonomy/

Alternative way to make alluvial charts: https://ggforce.data-imaginist.com/reference/geom_parallel_sets.html
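
---

# Appendix - Reshape Sketch

A minimal, self-contained sketch of the wide-to-lodes reshape from the summary. The `toy_wide` data frame and its three students are made up for illustration and are not the talk's data; the column positions passed to `axes` are an assumption that holds only for this toy layout.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data in wide form: one row per student, one column per axis.
toy_wide <- data.frame(
  student = 1:3,
  pre  = factor(c(1, 1, 2)),
  post = factor(c(4, 3, 3))
)

# Columns 2 and 3 (pre, post) become the axes.
# Each original row (student) becomes one alluvium with one lode per axis.
toy_lodes <- to_lodes_form(toy_wide, key = "assessment", axes = 2:3)

# Strata plus semi-transparent flows, filled by score level.
ggplot(toy_lodes,
       aes(x = assessment,
           stratum = stratum,
           alluvium = alluvium,
           fill = stratum)) +
  geom_stratum() +
  geom_flow(alpha = 0.5)
```

Swapping `geom_flow()` for `geom_alluvium()` in the last step would draw each student's full path across the axes instead of the axis-to-axis flows.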